85 research outputs found

    Moment-Matching Polynomials

    Full text link
    We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice

    Learning Graphical Models Using Multiplicative Weights

    Full text link
    We give a simple, multiplicative-weight update algorithm for learning undirected graphical models or Markov random fields (MRFs). The approach is new, and for the well-studied case of Ising models or Boltzmann machines, we obtain an algorithm that uses a nearly optimal number of samples and has quadratic running time (up to logarithmic factors), subsuming and improving on all prior work. Additionally, we give the first efficient algorithm for learning Ising models over general alphabets. Our main application is an algorithm for learning the structure of t-wise MRFs with nearly-optimal sample complexity (up to polynomial losses in necessary terms that depend on the weights) and running time that is nO(t)n^{O(t)}. In addition, given nO(t)n^{O(t)} samples, we can also learn the parameters of the model and generate a hypothesis that is close in statistical distance to the true MRF. All prior work runs in time nΩ(d)n^{\Omega(d)} for graphs of bounded degree d and does not generate a hypothesis close in statistical distance even for t=3. We observe that our runtime has the correct dependence on n and t assuming the hardness of learning sparse parities with noise. Our algorithm--the Sparsitron-- is easy to implement (has only one parameter) and holds in the on-line setting. Its analysis applies a regret bound from Freund and Schapire's classic Hedge algorithm. It also gives the first solution to the problem of learning sparse Generalized Linear Models (GLMs)

    Embedding Hard Learning Problems Into Gaussian Space

    Get PDF
    We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science and show that any algorithm for agnostically learning halfspaces requires n^Omega(log(1/epsilon)) time under the assumption that k-sparse LPN requires n^Omega(k) time, ruling out a polynomial time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et. al. 2013 who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions
    • …
    corecore